Title: DISTRIBUTION THEORY BASED ENRICHMENT OF SPARSE DATA FOR MACHINE LEARNING



IN THE CLAIMS 

The pending claims are set forth as follows: 

1. (Original) A computer-implemented method for enriching sparse data for machine
learning, comprising: 

receiving the sparse data; 

enriching the received data around a deviation of the mean of the received data using a 
predetermined distribution; and 

outputting the enriched data for unbiased learning and improved performance during the 
machine learning. 

2. (Original) The method of claim 1, wherein machine learning comprises: 
supervised artificial neural network learning. 

3. (Original) The method of claim 1, further comprising: 
checking the received data for sparseness; and 

enriching the checked data around the deviation of the mean of the received data based 
on the outcome of the checking. 

4. (Original) The method of claim 1, wherein checking the received data further comprises: 
comparing the received data with a predetermined number. 

5. (Original) The method of claim 4, wherein enriching the received data further comprises:
enriching the received data around the deviation of the mean of the received data based
on the outcome of the comparison.

6. (Original) The method of claim 1, further comprising: 
rearranging the received data based on class.

7. (Original) The method of claim 6, further comprising:

normalizing the rearranged data based on attributes in the rearranged data. 
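
For illustration only, the normalizing step might be sketched as follows. The claim does not say which normalization is used, so min-max scaling of each attribute (column) to the range 0 to 1 is an assumption here, and the function name `normalize_by_attribute` is hypothetical.

```python
def normalize_by_attribute(rows):
    """Scale every attribute (column) of `rows` to the range [0, 1].

    Assumption: min-max scaling; the claim only says the data is
    normalized "based on attributes".
    """
    columns = list(zip(*rows))            # transpose rows into columns
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]
```

A constant attribute carries no spread, so this sketch maps it to 0.0 rather than dividing by zero.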



8. (Original) The method of claim 6, further comprising: 

checking each class of data in the rearranged data for sparseness; and 
enriching each class of data around a deviation of the mean associated with the respective 
class based on the outcome of the checking. 

9. (Original) The method of claim 8, wherein checking each class of data further 
comprises: 

comparing each class of data to a predetermined number. 

10. (Original) The method of claim 9, wherein enriching each class of data comprises:
enriching each class around a deviation of the mean associated with the respective class
based on the outcome of the comparison.
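
As a hedged sketch of the rearranging and per-class sparseness check recited above: the code below groups labeled samples by class and flags every class whose sample count falls below a threshold. Reading the "predetermined number" as a minimum sample count is an assumption, as are the names `group_by_class`, `sparse_classes`, and `min_count`.

```python
from collections import defaultdict


def group_by_class(samples):
    """Rearrange (features, label) pairs into a dict of label -> list of feature rows."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    return dict(groups)


def sparse_classes(groups, min_count):
    """Return the labels whose class holds fewer than `min_count` rows.

    Assumption: the 'predetermined number' is a minimum sample count.
    """
    return [label for label, rows in groups.items() if len(rows) < min_count]
```

Only the classes returned by `sparse_classes` would then be passed on to enrichment; classes with enough samples are left unchanged.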

11. (Original) The method of claim 10, wherein enriching each class around a deviation of 
the mean associated with the respective class further comprises: 

computing the mean and standard deviation for each class of data in the rearranged data; and

generating additional data for each class using the associated computed mean and 
standard deviation. 

12. (Original) The method of claim 11, wherein generating additional data further
comprises: 

generating additional data between limits computed using the equation:
$\bar{x} \pm k\sigma$

wherein $\bar{x}$ is the computed mean associated with each class, $k$ is a constant varying
between 0.25 to 3, and $\sigma$ is the computed standard deviation associated with each class.
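
The limits above can be illustrated with a minimal sketch for a single attribute of a single class. It assumes a normal distribution, the sample standard deviation, and rejection of draws that fall outside the limits; none of these choices is fixed by the claim, and the name `enrich_class` and its parameters are hypothetical.

```python
import random
import statistics


def enrich_class(values, k=1.0, n_new=10, seed=None):
    """Return the original values plus `n_new` synthetic values inside mean +/- k*sigma."""
    if not 0.25 <= k <= 3:
        raise ValueError("k is expected to lie between 0.25 and 3")
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")

    rng = random.Random(seed)             # seeded generator for reproducible runs
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)      # assumption: sample standard deviation
    lower, upper = mean - k * sigma, mean + k * sigma

    generated = []
    while len(generated) < n_new:
        candidate = rng.gauss(mean, sigma)    # assumption: normal distribution
        if lower <= candidate <= upper:       # keep only draws inside the limits
            generated.append(candidate)
    return list(values) + generated
```

A smaller k keeps the generated values closer to the class mean; a larger k lets them spread further toward the tails of the distribution.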

13. (Original) The method of claim 12, wherein the predetermined distribution further
comprises: 

arranging the enriched data using the equation:

$[X_{mn}][W] = [B_i]$

wherein $W$ is a weight matrix, $X$ is input patterns, and $B_i$'s are the classes; and
rearranging in the max-min-max pattern:
Let for class $i$

$(Rx_{1N} - Rx_{2N}) > (Rx_{2N} - Rx_{3N}) > \dots > (Rx_{(i-1)N} - Rx_{iN}) =$

$(Rx_{(i+2)N} - Rx_{(i+1)N}) < (Rx_{(i+3)} - Rx_{(i+2)}) < \dots < (Rx_{AN} - Rx_{(A-1)N})$

where $Rx_{1N} \rightarrow \text{Row } x_{1N}$ are enriched data values.

14. (Original) The method of claim 1, wherein the predetermined distribution comprises 
distributions selected from the group consisting of normal distribution, exponential distribution, 
logarithmic distribution, chi-square distribution, t-distribution, and F-distribution. 
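
As a rough sketch of how several of the listed distributions could be sampled with the Python standard library, the code below builds a table of samplers. The chi-square, t, and F draws use their textbook constructions from gamma and normal variates. The logarithmic distribution is left out because the claim does not say which definition is meant. The name `make_samplers` and the default degrees of freedom are assumptions.

```python
import math
import random


def make_samplers(rng, df=5, df2=10):
    """Map distribution names to zero-argument functions returning one draw each."""

    def chi_square(d):
        # A chi-square variable with d degrees of freedom is Gamma(shape=d/2, scale=2).
        return rng.gammavariate(d / 2.0, 2.0)

    return {
        "normal": lambda: rng.gauss(0.0, 1.0),
        "exponential": lambda: rng.expovariate(1.0),
        "chi-square": lambda: chi_square(df),
        # Student's t: standard normal divided by sqrt(chi-square / df).
        "t": lambda: rng.gauss(0.0, 1.0) / math.sqrt(chi_square(df) / df),
        # F: ratio of two chi-square variables, each divided by its degrees of freedom.
        "F": lambda: (chi_square(df) / df) / (chi_square(df2) / df2),
    }
```

These samplers return standardized draws; shifting and scaling them to a class mean and standard deviation is a separate step.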

15. (Original) The method of claim 1, wherein the received data comprises data selected 
from the group consisting of static data and real-time data. 

16. (Original) The method of claim 15, further comprising: 

if the received data is static data, then reading a sample of the received static data using a 
predetermined window length; and 

if the received data is real-time data, then reading a sample of the received real-time data 
using a dynamically varying window of predetermined window length. 

17. (Original) The method of claim 16, further comprising: 

if the received data is real-time data, then repeating the reading of the sample of the 
received real-time data using a dynamically varying window of predetermined window length. 
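
The two reading modes might be sketched as follows. The sketch reads "dynamically varying window of predetermined window length" as a fixed-length window whose position advances as new values arrive; that reading is an assumption, and the names `read_static_window` and `sliding_windows` are hypothetical.

```python
from collections import deque


def read_static_window(data, window_length, start=0):
    """Read one sample of `window_length` items from static data."""
    return list(data[start:start + window_length])


def sliding_windows(stream, window_length):
    """Repeatedly yield the latest `window_length` items of a real-time stream."""
    window = deque(maxlen=window_length)   # oldest item drops out automatically
    for value in stream:
        window.append(value)
        if len(window) == window_length:   # wait until the window is full
            yield list(window)
```

For static data a single call returns one sample; for real-time data the generator repeats the read each time a new value arrives.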

18. (Original) A computer readable medium having computer-executable instructions for
performing a method of machine learning when only sparse data is available, comprising: 
enriching the sparse data around a deviation of the mean of the received data using a
predetermined distribution; and 

outputting the enriched data for unbiased machine learning. 

19. (Original) The computer readable medium of claim 18, wherein machine learning 
comprises: 

supervised artificial neural network learning. 

20. (Original) The computer readable medium of claim 18, further comprising: 
checking the received data for sparseness; and 

enriching the received data around the deviation of the mean of the received data based 
on the outcome of the checking. 

21. (Original) The computer readable medium of claim 18, wherein checking the received
data further comprises: 

comparing the received data with a predetermined number. 

22. (Original) The computer readable medium of claim 21, wherein enriching the received 
data further comprises: 

enriching the received data around the deviation of the mean of the received data based 
on the outcome of the comparison. 

23. (Original) The computer readable medium of claim 18, further comprising: 
rearranging the received data based on class. 

24. (Original) The computer readable medium of claim 23, further comprising: 
normalizing the rearranged data based on attributes in the rearranged data. 



25. (Original) The computer readable medium of claim 23, further comprising: 
checking each class of data in the rearranged data for sparseness; and 
enriching each class of data around a deviation of the mean associated with the respective
class based on the outcome of the checking. 



26. (Original) The computer readable medium of claim 25, wherein checking the each class 
of data further comprises: 

comparing each class of data to a predetermined number. 

27. (Previously Presented) The computer readable medium of claim 26, wherein enriching 
the each class of data comprises: 

enriching each class around a deviation of the mean associated with the respective class 
based on the outcome of the comparison. 

28. (Original) The computer readable medium of claim 27, wherein enriching each class 
around a deviation of the mean associated with the respective class further comprises: 

computing the mean and standard deviation for each class of data in the rearranged data; and

generating additional data for each class using the associated computed mean and 
standard deviation. 



29. (Original) The computer readable medium of claim 28, wherein generating additional 
data further comprises: 

generating additional data between limits computed using the equation:
$\bar{x} \pm k\sigma$

wherein $\bar{x}$ is the mean associated with each class, $k$ is a constant varying between 0.25
to 3, and $\sigma$ is the standard deviation associated with each class.

30. (Original) The computer readable medium of claim 29, wherein the predetermined 
distribution further comprises: 

arranging the enriched data using the equation:

$[X_{mn}][W] = [B_i]$

wherein $W$ is a weight matrix, $X$ is input patterns, and $B_i$'s are the classes; and
rearranging in the max-min-max pattern:
Let for class $i$

$(Rx_{1N} - Rx_{2N}) > (Rx_{2N} - Rx_{3N}) > \dots > (Rx_{(i-1)N} - Rx_{iN}) =$

$(Rx_{(i+2)N} - Rx_{(i+1)N}) < (Rx_{(i+3)} - Rx_{(i+2)}) < \dots < (Rx_{AN} - Rx_{(A-1)N})$

where $Rx_{1N} \rightarrow \text{Row } x_{1N}$.

wherein $Rx_{1N}$ is the first row of the Xth (reference) class consisting of $N$ features, $Rx_{2N}$ is
the second row of the Xth (reference) class consisting of $N$ features, and so on.

31. (Original) The method of claim 18, wherein the predetermined distribution comprises 
distributions selected from the group consisting of normal distribution, exponential distribution, 
and logarithmic distribution. 



32. (Original) The computer readable medium of claim 18, wherein the received data 
comprises data selected from the group consisting of static data and real-time data. 

33. (Original) The computer readable medium of claim 32, further comprising: 

if the received data is static data, then reading a sample of the received static data using a 
predetermined window length; and

if the received data is real-time data, then reading a sample of the received real-time data 
using a dynamically varying window of predetermined window length. 

34. (Original) The computer readable medium of claim 33, further comprising: 

if the received data is real-time data, then repeating the reading of the sample of the 
received real-time data using a dynamically varying window of predetermined window length. 

35. (Original) A computer system for a machine learning in a sparse data environment, 
comprising: 

a storage device; 

an output device; and 
a processor programmed to repeatedly perform a method, comprising:
receiving the data; 

enriching the received data around a deviation of mean of the received data using 
a predetermined distribution; and 

outputting the enriched data for unbiased machine learning. 

36. (Original) The system of claim 35, wherein machine learning comprises: 
supervised artificial neural network learning. 

37. (Original) The system of claim 35, further comprising: 
rearranging the received data based on class. 

38. (Original) The system of claim 37, further comprising: 

normalizing the rearranged data based on attributes in the rearranged data. 

39. (Original) The system of claim 37, further comprising: 

checking each class of data in the rearranged data for sparseness; and 
enriching each class of data around a deviation of the mean associated with the respective 
class based on the outcome of the checking. 

40. (Original) The system of claim 39, wherein checking the each class of data further 
comprises: 

comparing each class of data to a predetermined number. 

41. (Original) The system of claim 40, wherein enriching each class of data comprises:
enriching each class around a deviation of mean associated with the respective class
based on the outcome of the comparison.

42. (Original) The system of claim 41, wherein enriching the each class around a deviation 
of mean associated with the respective class further comprises: 
computing the mean and standard deviation for each class of data in the rearranged data; and

generating additional data for each class using the associated computed mean and 
standard deviation. 

43. (Original) The system of claim 42, wherein generating additional data further comprises: 
generating additional data between limits computed using the equation:

$\bar{x} \pm k\sigma$

wherein $\bar{x}$ is the mean associated with each class, $k$ is a constant varying between 0.25
to 3, and $\sigma$ is the standard deviation associated with each class.

44. (Original) The system of claim 35, wherein the predetermined distribution comprises 
distributions selected from the group consisting of normal distribution, exponential distribution, 
and logarithmic distribution. 

45. (Original) A computer-implemented system for machine learning in a sparse data 
environment, comprising: 

a receive module to receive sparse data; 

an analyzer to enrich the received data around a deviation of the received data using a 
predetermined distribution; and 

an output module coupled to the analyzer to output the enriched data for unbiased 
learning and increased performance during machine learning. 

46. (Original) The system of claim 45, further comprising: 

a database coupled to the receive module to receive and store sparse data. 

47. (Original) The system of claim 45, wherein machine learning comprises: 
supervised artificial neural network learning. 



48. (Original) The system of claim 45, further comprising: 
a comparator coupled to the analyzer to check the received data for sparseness, wherein
the analyzer enriches the checked data around the deviation of the mean of the received data 
based on the outcome of the checking. 

49. (Original) The system of claim 48, wherein the comparator checks the received data for 
sparseness by comparing the received data with a predetermined number. 

50. (Original) The system of claim 49, wherein the analyzer enriches the received data 
around the deviation of the mean of the received data based on the outcome of the comparison. 

51. (Original) The system of claim 50, wherein the analyzer rearranges the received data
based on class. 

52. (Original) The system of claim 51, wherein the analyzer normalizes the rearranged data
based on attributes in the data. 

53. (Original) The system of claim 51, wherein the analyzer checks each class of data for 
sparseness, and enriches each class around a deviation of the mean associated with the class 
based on the outcome of the checking by the analyzer. 

54. (Original) The system of claim 53, wherein the comparator compares each class in the 
rearranged data with a predetermined number and wherein the analyzer enriches each class 
around a deviation of the mean associated with the class based on the outcome of the comparison 
by the comparator. 

55. (Original) The system of claim 54, wherein the analyzer enriches data in each class by 
computing a mean and standard deviation for each class in the rearranged data, and the analyzer 
further generates additional data for each class based on the respective computed mean and 
standard deviation.

56. (Original) The system of claim 55, wherein the analyzer generates additional data
between limits computed using the equation: 

$\bar{x} \pm k\sigma$

wherein $\bar{x}$ is the mean associated with each class in the rearranged data, $k$ is a constant
varying between 0.25 to 3, and $\sigma$ is the standard deviation associated with each class in the
rearranged data.

57. (Original) The system of claim 56, wherein the analyzer further computes additional data 
using the equation: 

$[X_{mn}][W] = [B_i]$

wherein $W$ is a weight matrix, $X$ is input patterns, and $B_i$'s are the classes; and
rearranging in the max-min-max pattern:
Let for class $i$

$(Rx_{1N} - Rx_{2N}) > (Rx_{2N} - Rx_{3N}) > \dots > (Rx_{(i-1)N} - Rx_{iN}) =$

$(Rx_{(i+2)N} - Rx_{(i+1)N}) < (Rx_{(i+3)} - Rx_{(i+2)}) < \dots < (Rx_{AN} - Rx_{(A-1)N})$

where $Rx_{1N} \rightarrow \text{Row } x_{1N}$.

58. (Original) The system of claim 45, wherein the received data comprises data selected 
from the group consisting of static data and real-time data. 

59. (Original) The system of claim 58, further comprising: 

a reading module coupled to the receive module reads a sample of the received data 
having a predetermined window length. 



60. (Original) The system of claim 59, wherein the reading module reads the sample of the 
received data using a predetermined window length when the read data is static data, and reads a 
sample of the received data using a dynamically varying window of predetermined window 
length when the read data is real-time data.

61. (Original) The system of claim 60, wherein the reading module repeats the reading of the
sample of the received data using a dynamically varying window of predetermined window 
length when the received data is real-time data. 

62. (Previously Presented) The system of claim 45, further comprising: 

a database coupled to the receive module to receive and store sparse data; and 

a unique numeric transformation module coupled to the database to extract words from
text stored in the database and to transform each of the extracted words into a unique numerical
representation.
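
One possible reading of the unique numeric transformation is sketched below: each distinct word receives the next unused integer in order of first appearance. The claim does not specify the encoding, so this scheme, the word pattern, and the name `words_to_ids` are all assumptions.

```python
import re


def words_to_ids(text, vocabulary=None):
    """Extract words from `text` and map each distinct word to a unique integer.

    Assumption: ids are assigned 1, 2, 3, ... in order of first appearance,
    and any `vocabulary` passed in was built by this same function.
    """
    vocabulary = {} if vocabulary is None else vocabulary
    ids = []
    for word in re.findall(r"[A-Za-z0-9']+", text.lower()):
        if word not in vocabulary:
            vocabulary[word] = len(vocabulary) + 1   # next unused id
        ids.append(vocabulary[word])
    return ids, vocabulary
```

Passing the returned vocabulary back in on later calls keeps the numbering consistent across texts.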



